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A globally convergent matricial algorithm for 
multivariate spectral estimation 

Federico Ramponi, Augusto Ferrante and Michele Pavon 

Abstract 

In this paper, we first describe a matricial Newton-type algorithm designed to solve the multivariable spectrum approximation 
problem. We then prove its global convergence. Finally, we apply this approximation procedure to multivariate spectral estimation, 
and test its effectiveness through simulation. Simulation shows that, in the case of short observation records, this method may 
provide a valid alternative to standard multivariable identification techniques such as MATLAB's PEM and MATLAB's N4SID. 

Index Terms 

Multivariable spectrum approximation, Hellinger distance, convex optimization, matricial Newton algorithm, global conver- 
gence, spectral estimation. 

I. Introduction 

ARMA identification methods usually lead to nonconvex optimization problems for which global convergence is not 
guaranteed, cf. e.g. ||33]| . |[39l , [40], 111]. Although these algorithms are simple and perform effectively, as observed in 
BOl p.l03], 132. Section 1], no theoretically satisfactory approach to ARMA parameter estimation appears to be available. 
Alternative, convex optimization approaches have been recently proposed by Byrnes, Georgiou, Lindquist and co-workers [3], 
ll30l in the frame of a broad research effort on analytic interpolation with degree contraint, see fTl, ||2l, H, lISJ, 161, 0, JFl- 
lfT3l . ifTSl . lfT9ll . Il20l . Ell, Ea, El, El, E3, E3, ESI, \M and references therein. In particular, IS) describes a new setting 
for spectral estimation. This so-called THREE algorithm appears to allow for higher resolution in prescribed frequency bands 
and to be particularly suitable in case of short observation records. It effectively detects spectral lines and steep variations (see 
||36J for a recent biomedical application). An outline of this method is as follows. A given realization of a stochastic process 
(a finite collection of data Ui-.-Un) is fed to a suitably structured bank of filters, and the steady-state covariance matrix of the 
resulting output is estimated by statistical methods. Only zeroth-order covariance lags of the output of the filters need to be 
estimated, ensuring statistical robustness of the method. Finding now an input process whose rational spectrum is compatible 
with the estimated covariance poses naturally a Nevanlinna-Pick interpolation problem with bounded degree. The solution of 
this interpolation problem is considered as a mean of estimating the spectrum. A particular case described in the paper is the 
maximum differential entropy spectrum estimate, which amounts to the so-called central solution in the Nevanlinna-Pick theory. 
More generally, the scheme allows for a non constant a priori estimate of the spectrum. The Byrnes-Georgiou-Lindquist 
school has shown how this and other important problems of control theory may be advantageously cast in the frame of convex 
optimization. These problems admit a finite dimensional dual (multipliers are matrices!) that can be shown to be solvable. The 
latter result, due to Byrnes and Lindquist |8| (see also lfT6l ) is, however, nontrivial since the optimization occurs on an open, 
unbounded set of Hermitian matrices. The numerical solution of the dual problem is also challenging f9l, flTl, f35 |, since the 
gradient of the dual functional tends to infinity at the boundary of the feasible set. Finally, reparametrization of the problem 
may lead to loss of global concavity, see the discussion in [28, Section VII]. 

This paper adds to this effort in that we consider estimation of a multivariate spectral density in the spirit of THREE f9l, 
but employing a different metric for the optimization part, namely the Hellinger distance as in [17J . In papers [6J, [7|, Byrnes, 
Gusev and Lindquist chose the Kullback-Leibler divergence as a frequency weighted entropy measure, thus introducing a 
broad generalization of Burg's maximum entropy method. More recently, this motivation was supported by the well-known 
connection with prediction error methods, see e.g. BTIl . Il32ll . In the multivariable case, a Kullback-Leibler pseudodistance may 
also be readily defined ll24l inspired by the von Neumann's relative entropy ll44l . ||43| of statistical quantum mechanics. The 
resulting spectiTim approximation problem, however, leads to computable solutions of bounded McMillan degree only in the 
case when the prior spectral density is the identity matrix [24,1 , [17J (maximum entropy solution). On the contrary, with a 
suitable extension of the scalar Hellinger distance introduced in ifTTll . the Hellinger approximation generalizes nicely to the 
multivariable case for any prior estimate ^ of the spectrum. 

The main contributions of this paper, after some background material in Sections II-IV, are found in Sections V-VIII. In 
Section V, we establish strong convexity and smoothness of the dual functional on a certain domain of Hermitian matrices. 
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In Section VI, we analyze in detail a variant of a Newton-type matricial iteration designed to numerically solve the dual 
of the multivariable spectrum approximation problem. It had originally been sketched in |17|. The computational burden is 
dramatically reduced by systematically resorting to solutions of Lyapunov and Riccati equations thanks to various nontrivial 
results of spectral factorization. We then show in Section VII that the algorithm is globally convergent. Finally, in Section VIII, 
we present guidelines for its application to multivariate spectral estimation and present some simulations comparing to existing 
methods. Simulation in the multivariable case shows that, at the price of some moderate extra complexity in the model, our 
method may perform much better than MATLAB's PEM and MATLAB's N4SID in the case of a short observation record. 



II. Constrained spectrum approximation 

Paper ifTSl introduces and solves the following moment problem: Given a bank of filters described by an input-to-state 
stable transfer function G{z) = {zl — A)^^B and a state covariance matrix E, give necessary and sufficient conditions for the 
existence of input spectra <I>(ej'') such that the steady state output has variance S, that is, 

G$G* = S. (1) 

Moreover, parametrize the set of all such spectra (here, and in the sequel, integration takes place on the unit circle T with 
respect to normalized Lebesgue measure d'd/2'K) . Throughout this paper we use the following notations: A* — A^ for matrices 
and G* = G*{z) = G^ (z^^) for spectra and transfer functions. The scalar product between square matrices is defined as 
{A, B) = tr AB*. Let S = 5" be the family of C^^^-valued functions defined on the unit circle which are Hermitian, 
positive-definite, bounded and coercive. We have the following existence result ifTSl : There exists e 5 satisfying ([TJ if and 
only if there exists H e C™^" such that 

Y.- A1:A* = BH + H*B* (2) 

Paper 1281 deals with the following (scalar) spectrum approximation problem: When constraint ([TJ is feasible, find the spectrum 
$ which minimizes the KuUback-Leibler pseudo distance 



f ^ 



from an "a priori" spectrum subject to the constraint It turns out that, if the prior is rational, the solution is also 
rational, and with degree that can be bounded in terms of the degrees of G{z) and "if. This problem again admits the maximum 
differential entropy spectrum (compatible with the constraint) as a particular case (^'(ej'') = 1). The above minimization poses 
naturally a variational problem, which can be solved using Lagrange theory. Its dual problem admits a maximum and can be 
solved exploiting numerical algorithms. In IfTSl we restated and solved a similar variational problem with respect to a different 
metric, namely the Hellinger distance: 




W / V«'-V$ (3) 



Equation ([3]) defines a bona fide distance, well-known in mathematical statistics. The main advantage of this approach to 
spectral approximation is that it easily generalizes to the multivariable case, whereas log-like functionals do not enjoy this 
property C], EZJ, L24J, UIJ. 



III. Feasibility and the operator F 

In this section, we discuss in depth the feasibihty of ^. Following |28| and |171, let n{n) = {M e C"^" : M = 
M*}, let C(T;7Y(to)) be the space of 7Y(m)-valued continuous functions defined on the unit circle, and let the operator 
r : C(T; H{m)) H{n) be defined as follows: 

r($) := J G^G* (4) 

We are interested in the range of the operator F which, having to deal with Hermitian matrices, we consider as a vector space 
over the reals. 

Proposition 3.1: The following facts hold: 

1) Let S = S* > 0. The following are equivalent: 

. There exists e C™""" such that identity ^ holds. 
. There exists $ e 5™^"(T) such that / G$G* = E. 
. There exists $ e C(T; 7i(m)), $ > such that r($) = E. 

2) Let S = E* (not necessarily positive definite). There exists H e (^mxn ^^^j^ jj^^j. jjjgjj^j^y ^ holds if and only if 
E e Range F. 

3) X e Range F-L if and only if G*(ej'')XG(ej'') = Vz9 G [0,27r]. 
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Proof: 

As stated above, it was proved in fTsl that there exists H e such that identity ^ holds with Hermitian and positive 

definite S if and only if E = J G^G* for some $ e 5™^™(T), > 0. A similar result, albeit with a different algebraic 
formulation of the feasibility condition, was proved in ifTTl Proposition 2.1]. The proof of Fact 1 is straightforward once we 
note that the "if" part of the proof of [TT, Proposition 2.1] is constructive, and exhibits a continuous spectrum. Hence, the fact 
that there exists a spectrum $ such that S = J G^G* is equivalent to there exists a continuous spectrum such that the same 
holds. 

As for the second assertion, let S G Range F. Then there exists $ e C{T;H{m)) such that 



G$G* = J G($+ -$_)G* 
= / G$+G* - / G$_G* = S, - S_ 



where <!>+ and <&_ are two spectra such that <i>4. — <&_ = $ (they can be chosen to be bounded away from zero) and where E+ 
and E_ are symmetric positive definite. Hence S is a difference of positive matrices for which (j2]l holds. This establishes Q 
for S itself. Vice versa, suppose that (|2]) holds for an Hermitian S. Let be the unique solution of the following Lyapunov 
equation: 

S„ - AY.aA* = B (aB*) + (aB*)* B* = 2aBB* 

where a e M. Then depends linearly upon a, i.e. Eq, = aEi, where Si > since {A, B) is reachable. Thus, there exists 
an a such that > and Sq, > E. Let E_ = — S. Then S_ > 0, and since Q holds for S and E^, it also holds for S_. 
Then assertion 1 impUes that there exist <i>a > and $2 > in C(T; H{m)) such that Sq = / G(1>qG* and S- = / G(1>2G*, 
hence S = / G(<i>Q — <i>2)G* and assertion 2 follows. 

The third assertion is a simple geometrical fact: If X e Range F-'-, then for any $ G C(T; 7i(m)) 



{)= (x, j G$G* ^ = tr X y" G$G* = tr ^ (G*XG)$ 



and the conclusion follows. □ 

Remark 3.2: The underlying statement in Proposition |3.1[ Facts 1 and 2, is that if we defined F over the vector space of 
finite linear combinations of functions in iS™^'"(T), its range would remain the same. 



Remark 3.3: Proposition 3.1 shows that Range F is the set of all the Hermitian matrices E for which there exists H such 
that Q holds. This fact will be useful in numerical computations. Indeed, Range F is obviously finite-dimensional, and if 
{Hi, Hn} is a base of C™^", then the corresponding solutions {Ei, Eat} of (j2|, considered as a discrete-time Lyapunov 
equation in the unknown E, generate RangeF. Note that {Ei, ...jEat} are not necessarily linearly independent. 

IV. MULTIVARIABLE SPECTRUM APPROXIMATION IN THE HELLINGER DISTANCE 

Let the function dn ■ 5™''™(T) x ^"''"(T) ^ M+ be defined as follows: 

dni'i, ■■= inf tr / {W^ - W^) [W^ - W^)* 

Win , Wis. J 

s. t. W^W^ = W^W^ ^ 

Wij, and of dimension m x m 

that is, dni^i', ^) is the distance between the sets of square spectral factors of the two spectra. It was shown in ifTTl that 
the infimum in (jSj is actually a minimum and that dn is a bona fide distance between spectral densities, and reduces to the 
ordinary Hellinger distance in the scalar case. It was also shown there that the minimum in (|5]l is the same if we fix a square 
spectral factor W5, of "if, and then minimize over the spectral factors of <&: 



= mintry" {W^ - W) {W^ - W)* 



(6) 

s. t. WW* = $ 

The multivariable spectrum approximation problem addressed in ifTTll is the following. Let G(z) = {zl — A)~^B, where A is 
stable, B has full rank and {A, B) is reachable. Given G 5, find 

argmind^r*.*) s. t. / G$G* = S (7) 
*e5 ' J 
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Since S > 0, applying the following change of base to G{z): 

G{z) = (zl - A)-^B = E-i/2G(z), 
it is easy to see that there is no loss of generality in taking E = /. Thus, once a factor W% of is fixed, (j7]i reduces to: 

argmintr / {W^ - W) {W^ - W)* 

^ (8) 
s. t. / GWW*G* = I 



which is an constrained minimization. 

Remark 4.1: In 117. Theorem 6.1], it was shown that the minimizer in (|6]l is explicitly given by 

Nevertheless, to solve the approximation problem (|8]l, we do not need to employ (j9]l (see below). 
A. Variational analysis 

Now let us assume that the problem is feasible, i.e., condition (j2]l holds. To solve ([8]l, form the Lagrangian: 



L{W, A) = tr I (VF* - W) {W^ - W) 

(10) 

A, / GWW*G* -I' 



where A G 7i(n). Since J GWW*G* S Range F by construction, and / e Range F by the feasibility assumption, it is natural, 
though not strictly necessary, to restrict a priori the Lagrange parameter A to Range F (a A e Range F-'- would not play any 
role in the above Lagrangian). 



We proceed with unconstrained minimization of (lOi. The functional (lOi is convex and differentiable in W. Thus, to find 



the unique minimizing solution we impose that the first variation of (lOi is zero in each direction SW. We easily find the 
following condition for W (see ifTTl for the details): 

W -Wq, + G*AGW = 

To carry on with the computations, and to ensure that the resulting optimum spectrum is integrable over the unit circle, we 
require a posteriori that A belongs to the following set: 

C" = {AeH{n), / + G*AG > V ej'' e T} (11) 

that is, A e , where 

:=RangeFn/:^. (12) 

If this is the case, the optimal spectral factor and the corresponding optimal spectral density are easily found to be: 

(I + G*AG)-^W^, 

* , (13) 

$ = (/ + G*AG)-^^{I + G*AG)~^ 

Remark 4.2: Observe that, when is rational, ( [T3| yields a rational spectrum with McMillan degree that can be bounded. 
The same applies to scalar spectrum approximation problem in a KuUback-Leibler type distance where the degree of the 
optimal approximant is actually lower [28 1. In the multivariable case, however, the Kullback-Leibler solution is computable 
and of bounded McMillan degree only when ^ — I (maximum entropy solution), see Il22l Theorem 1], ll24l Section 4]. 

Consider now the issue of existence of a matrix A e such that 



j G{I + G*AG)-H{I + G*AG)-^G* = I 



(14) 



that is, such that the corresponding optimal spectrum satisfies the constraint ([T]i. A key result of ifTTl Theorem 7.7], inspired 
by a fundamental result of Byrnes and Lindquist f?!, states that such a A indeed exists and is unique, therefore establishing 
that for such A ( [T3| ) is the solution to problem (|8]l. 

Remark 4.3: Identity ( [T4| is attained at A = if and only if itself satisfies the contraint ([T]i. The "only if" part is trivial. 
As for the "if" part, let J G^G* = /. Then substituting A = into we obtain the spectrum $ = ^P, which has the least 



DRAFT 



5 



possible distance dn = from and hence is automatically optimal, and trivially satisfies ( 14 1. The assertion follows from 
the uniqueness of A. 

In order to find the optimal A, we form the dual functional (see P31): 

Ld{A) = L{W, A) =tr J - {I + G* AG)-^^) - tr A (15) 

Notice that Ld(A) is finite on ■ Recall that the dual of a Lagrangian functional is always concave and that a finite convex 
(or concave) function defined on a finite dimensional space is continuous on the interior of its domain (see pT) or |[37|). 



Instead of maximizing ( 15 i, we consider the equivalent problem of minimizing the following functional; 

J*(A) = -Ld(A) +tr J^ = tr J {I + G* AG)-^^ + tr A (16) 

The minimization of the convex and continuous functional over is the main subject of this paper The following 
sections are dedicated to prove strict convexity and smoothness of J<s,, to describe a Newton-type algorithm for its numerical 
minimization, and to prove the global convergence of that algorithm. Some numerical simulations follow. 

V. Properties of the functional 

In this section, we establish various properties of the functional J>i, on £^ . We begin by recalling a few basic definitions 
and facts from multivariate analysis. A function f : S C M*^ is (Frechet) differentiable on the open set S if for all 

X G S there exists a linear map : — > M*^ such that 

\\f(x + h)-fix)^L,ih)\\ ^ ^ 
h^O \\h\\ 

A function / is said to be C'^(S') if it is continuous on S. Also, / is said to be C^{S) if it is differentiable at each x G S and 
if the operator Df defined by 

Df{x) 

is G'^{S). Now the derivative Df : S i(M^,M*^) ~ M^^^ is itself a function between finite-dimensional spaces. If Df is 
G^{S), then / is said to be G'^{S). Proceeding in this way, the C'^'-differentiability of / can be defined. Finally, / is said to 
be of class G°°{S) if it is G'^{S) for all k E N. A standard result in analysis states that / e G^{S) if and only if the partial 
derivatives (where is the m-th component of /) exist and are continuous on S (see for instance |38 , Theorem 9.21]). 
It follows that f & {S) if and only if / has in S continuous partial derivatives of any order up to k, that is: 

ah f 



for all m,ni^h s.t. 1 < m < M, I < rii < N, and < h < k. 

In our setting, where : C Range F M, the role of the above partial derivatives is played by the directional (or 
Gateaux) derivatives, 

(5''Jm,(A;(5A„,,...,(5A„J 

where 1 < ni < d and {SAi, ...,SAd} is a fixed orthonormal base of RangeF. A fortiori, if we show that J* has on 
continuous directional derivatives of any order up to k, taken in whatever directions {SAi, ...,6Ak} C RangcF, then we can 
say that ,U e G''{C^). 

Lemma 5.1: Let H G 7i(ri) and m be its minimum eigenvalue. The map H m is continuous. 
Proof: 

The map from a matrix H to the vector of coefficients of its characteristic polynomial a{s) — det(s/ — H) — qq + ... + 
a„_is"^^ + s" is continuous. Indeed, each of the coefficients of a{s) is obtained by means of sums and products of elements 
of H. Moreover, it is a well-known fact (see for example 1341 ) that the mapping from the coefficients of a monic polynomial to 
its roots is continuous, in the following sense: Given a(s) = s" + X]"=o^ aiS^, let be the zeros of a{s) and ui the respective 
multiplicities. For all e > 0, there exists 6 > such that if b{s) = s" + J^^Zq biS^ and |&i — a^l < (5 for alH = 0, 1, . . . , n — 1, 
then h{s) has Vi zeros in the ball centered in A,; with radius e. In conclusion, if H is Hermitian, the mapping from H to its 
minimum (real) eigenvalue is continuous. □ 

Lemma 5.2: Define Qa{z) — I + G* {z)AG{z). Consider a sequence A„ e converging to A e ■ Then are well 
defined and continuous on T and converge uniformly to Q']^ on T. 
Proof: 
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Observe that, for A G C^, Qa is a positive definite, continuous matrix function on T. By Lemma 5.1 there exists a 
continuous function mA(c^'') > such that QaIc^^) > m-Aic^^)! for each i). Hence, QA{<i^^) > rnt^I^y-d, where ttia '■— 
min^ m[y{e^'^) > 0. Let 6 A e 5(0, e), the closed ball of radius e centered in 0. Now, 

||G*(ej'')MG(ej'')|| < ||<5A||Mg < eMa 

where 

MG-max||G*(ej'')|| ||G(ej^)||. 

Thus, if we choose e < mj~^/Mc, then ||G*(ej'')(5AG(ej'')|| < m^. Hence, /+ G*(A + 5k)G describes, as {5h,-d) varies 
in 5(0, e) x [— tt.tt], a compact set that does not contain any singular matrix. Now recall that the matrix inversion operator 
is continuous at any nonsingular matrix. Hence, Qa+5a(^^'') admits a uniform bound Af (A, e) on _B(0,e) x [— 7r,7r]. Since 
A„ A, for n sufficiently large, (A„ - A) e BiQ.e). Then 

sup WQll - Ql^W = sup WQll [G*(A - A„)G] Q^'ll 
< M2sup||G*(A- A„)G|| 



This implies that Q^^ uniformly on T. □ 

Theorem 5.3: Consider J^, : C Range F M. Then 

1) J* € G°°(£f). 

2) J* is strictly convex on . 
Proof: 

Let I : A^^ be the matrix inversion operator. Making use of 

6I{A; SA) = -A-^6AA-\ (17) 
the first variation of Jqi{A) in an arbitrary direction 6Ai is found to be: 



(5J>t(A;(5Ai) ^ -tr J Q^^G* SAiGQ^^'i' + tr 5Ai 
^1^1- j GQl^Ql^G*, Ml 



(18) 



The linear functional V J*^a(') '■— SJ^,{A; •) defined by (18 i is the gradient of J* at A. To prove that J* G G^(£^) we must 
show that, for any fixed 6Ai, (5J*(A; 6Ai) is continuous in the variable A (it follows that V Jii,.a(-) is also continuous in A). 
Consider a sequence Af„ g Range F converging to 0. By Lemma 5.2 Qa+m converge uniformly to Q^^- Recall that ^ is 



bounded. Applying elementwise the bounded convergence theorem, we get 

lim / , G*<5AiGQ7i vi/^ / Q-^G*6A,GQl'^ 



Hence, for all SAi E RangeF, SJis{A; SAi) is continuous, i.e. J* e G^(£^). The second variation of J^, say in direction 
5A2, is easily obtained applying ( [T7| and the chain rule to (I81: 



6^J^{A; (5Ai,(5A2) 

= tr J w;,Ql'G*SA2GQl'G*dAiGQl^W^ ^^^^ 
+ tT J W^Ql^G*SAiGQl^G*dA2GQl^W^ 
The bilinear form Ha{-, ■) ■— (5^ J>i>(A; •, •) is the Hessian of J<s, at A. Again, continuity of (5^ J,f (A; SAi, 5A2) can be established 



by the previous argument in view of Lemma 5.2 Similarly, it can be shown that has continuous directional derivatives of 
any order. Thus e C''{£p) for any k, and the first assertion follows. Finally, we show that is strictly convex on . A 
standard result in the theory of convex functions states that if a function / : 5 C M is G^(S') (where S is open), then / 

is strictly convex on S if and only if its Hessian is positive definite at each x E S. Consider Ha{SA, SA) = S^J^i{A; 6A, 6A) 



for 6A E RangeF \ {0}. Since the integrand in ( 19i is positive semidefinite, it follows that Ha{SA, 5 A) > 0. In view of Point 



3 in Proposition 3.1 the integrand is not identically zero and H\{6A, SA) > 0. It follows that Jq, is strictly convex. □ 



Loosely speaking, ifTTI establishes existence of the minimum by showing that J5, is bounded from below and that J^{A) — > 
-00 whenever ||A|| — > +00 or A approaches dCp, the boundary of ■ Since the minimum point cannot reside at infinity 
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or at the boundary, we can then restrict the minimization problem to a sublevel set of Jif. It follows from the continuity of 
that such a set is compact, and the existence result follows from Weierstrass' theorem. Since J,p is strictly convex, the 
minimum point is unique. 

VI. A MATRiciAL Newton algorithm 

A. Description of the iterative method 

The Newton algorithm is an iterative procedure for the search of roots of a function or the minimization of a functional. 
With respect to the latter objective, it can be formulated as follows. Let / : 5 ^ M be a functional defined over S C M". In 
order to find an estimate x of a minimum point x* of /, 

1) Make an initial guess xq, possibly near the minimum point. 

2) At each iteration, compute the Newton step 

Ax, = ~H-yU^ (20) 

where H^. is the Hessian of / at Xi and Vf^^ is the gradient of / at Xi (understood as a column vector). 

3) Set t° = 1, and let = if /2 until both of the following conditions hold: 

x^ + t'^Ax,eS (21) 
fix, + t^Ax,) < fix,) + at'yVf^Ax, (22) 

where a is a real constant, < a < 1/2. 

4) Set Xi+i ^ Xi + t^Axi. 

5) Repeat steps 2, 3 and 4 until IV/^:-! < e, where e is a (small) tolerance threshold, then set x — Xi. 

In its "pure" form, the iteration of the Newton algorithm only consists in step 2, which is indeed its essential part. Step 3 is 
the so-called backtracking procedure. For small t, iff is sufficiently regular, we have /(x^ + tAxi) ~ fi^i) + tWfjAxi. 



Since WfJ.Axi = ^^fj H^.^W fx^ < 0, condition (22 1 must hold for small t, hence step 3 must terminate at some iteration. 
Since W fJ.Axi < 0, (22 1 implies fixi + t^Axi) <j{xi)- Th^^ is, {/(xi)} is a strictly decreasing sequence. 

In essence, the "pure Newton algorithm works very well when the starting point happens to be near the minimum and the 
function / is there effectively approximated by a quadratic form, but it can suffer from numerical problems when this is not 
the case. The backtracking line search is a remedy to this drawback; moreover it can be shown that, under certain regularity 



assuptions on /, which hold in our case (see Section VII i, after a finite number of iterations step 3 always selects the multiplier 
t — \, that is, the full step. During the latter stage, the convergence to the minimizing solution is quadratic, meaning that 
there exists a constant C such that \ \xi^i — x*\ \ < C\\xi — x*]]"^ . This rate of convergence makes the Newton algorithm often 
preferable over other minimization methods (see ITOl ). We must minimize the functional Jif (A) over the set As initial 
condition, we can safely choose 0. Hence, set 

Ao - 0. (23) 

It turns out that, although the problem is finite-dimensional, the inversion of the Hessian is more demanding than inverting a 
matrix. In order to compute the Newton step AA^, we must solve at A^ the following linear equation: 

i7A.(AA„-) = -VJ*,A.(-) (24) 



where, once fixed A^, VJ* a. (•) and H\.i-,-) must be understood as a linear and a bilinear form, defined by (18 1 and (19 1 
respectively. Comparing with the above definitions, ([24| reduces to: 



fcQl' [(G*(AA,)GQa'^) + (G*(AA,)Ggx'^)*] Q^'G* 

(25) 



In principle, equation (25 i is not difficult to solve. We suggest the following procedure: 

• At the beginning of the procedure, take a base {Hi, Hk, Hjy} of C^^'^I^Then compute the solutions {Ei, E^, Ejv} 
of the following discrete-time Lyapunov equations: 

^k~AJ:kA*^BHk + H;B* 

As shown before, these solutions generate Range F. 

• To compute AA, at each step, 

' Actually, it suffices to take the {Hf^} to be a base of C™^" © Ker R. where the map R is defined by 

R: H ^ BH + H*B*. 
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1) Compute the integral 



J GQl]^Ql]G* - I 
2) For each E^, in the precomputed generators, compute the following integral: 



3) Solve, by means of linear algebraic methods (the Moore-Penrose pseudoinverse), the equation 

^kYu = Y 



4) By linearity, the solution to ([25]l is 



(26) 



(27) 



(28) 



(29) 



It is clear that the real difficulty here is the computation of the integrals (26i and (27 1. This task requires extensive use of 
the following results of linear stochastic systems theory. 

Lemma 6.1: Let A be a stability matrix and W{z) = C{zl — A)^^ B + D a minimal realization of a spectral factor of ^{z). 
Let n be the unique solution to the Lyapunov equation 



n = AHA* + BB* 



(30) 



Then the following hold: 

1) jZ^^ie^-^)'^ = cue* + DD*. 

2) Z{z) = C{zl - A)-\AnC* + BD*) + \{CTIC* + DD*) is a realization of the causal part of $(z); that is, Z{z) is 
analytic outside the unit circle and ^{z) ~ Z{z) + Z*{z). 

Lemma 6.2: Let Z{z) — C{zl — A)^^G + |S be a minimal realization of the causal part of a spectrum <i>(z). Let P_ be 
the stabilizing solution of the following Algebraic Riccati Equation (ARE): 



P = APA* + (G - APC*){Y. - GPG*)-\G* - CPA*) 



(31) 



Let moreover D = (E - GP_G*)i/2 and B = [C - AP_C*)D-^. Then W{z) = C{zl - A)-'^B + D is the minimum phase 
spectral factor of <i>(z); that is, W{z) is stable and with stable causal inverse, and $(2) = W{z)W*{z). 
Lemma 6.3: Let F{z) = C{zl — A)~^B + D be a square transfer function, where D is invertible. Then 



F-\z) ^ -D~^C {zl - {A- BD-^C)) ^ BD' 



D- 



(32) 



is a realization of its inverse. 

Lemma 6.4: For all matrices p = p* ^ C"X" 



the following identity holds: 

[ B*{z~^I ~ A*y I ] 





A*PA-P A*PB 
B*PA B*PB 



{zI-A)-^B 
I 



(33) 



Lemma 6.5: Let A be a stability matrix and H{z) — C{zl — A) + D be a minimal realization. Let P be the solution 
of the Lyapunov equation 

P = A*PA + C*C. (34) 



Let 



be an ortho-normal basis of ker [A*P^/^ C*] i.e. 



[^*pl/2 C*] 



[K* J*] 



Let G := p-^/^K and 



Hi{z) := {D* C + B* PA){zI - Ay^G + B* PC + D* J. 



(35) 



(36) 



Then, H*{z)H{z) = Hi{z)Hl{z). 

Lemmas |6.1[ |6.2| and |6.3| are standard results (see for example lfT4l ). The proofs of Lemmas 6.4 and 6.5 can be found in 
IIT2I Appendix A] and 1171 Appendix A], respectively. 
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Remark 6.6: Lemma |63] not only gives us a tool to compute a left factor from a right factor of a given spectrum. It also 
works in the opposite direction. Indeed, let W{z) = C{zl — A)~^B + £> be a minimal realization, and let ( = z~^. Then 

$(2) = W{z)W*{z) 

= {C{zl - Ay'^B + D){B^{z-^I - A'^y^C^ + D^) 

= {B^iC'^I - A^Y^C^ + D^y(B^(C,I - A'^y^C^ + D^) 

= (B^(C/ - A'^y^C^ + D^)'(B^(C/ - A'^y^C^ + D^) 

Applying Lemma [O] we can find an Hi{C) ^ H{CI - Py^G + K such that H*{C)H{C) ^ Hi{C)Hl{0- Now turning back 
to z: 

= {H{CI - Py^G + K)(G'^{C^I - F^y'H^ + K'^) 
= (G^(2"'J - F^y^H^ + K^y{G^{zI - F^y^H^ + K^) 
= (G^(zJ - F^y^H^ + K^y{G'^{zI - F^y^H^ + K^) 
= W^{z)Wi{z) = $(2) 

B. Factorization of (z) 

The first problem to solve is to obtain a spectral factor of Q^^{z), where Q\{z) — {I + G* {z)AG{z)). To this end, note 
that 

" A 



/ 



{zI-Ay^B 

I 



(37) 



Applying lemma 6.4 we can rewrite pT] ) as 

Qa{z) 



Now, the following linear matrix inequality: 



A*PA-P + A A*PB 

B*PA B*PB + I 

{zI-Ay^B 

I 



A*PA-P + A A*PB 
B*PA B*PB + I 



(38) 



N* 



[ M ] > 



is solvable for P — P* > if and only if such is the following ARE: 

P = A* PA - A*PB{B*PB + iy^B*PA + A 



(39) 



(40) 



The stabilizing solution P of (40l gives a realization for the square, minimum phase co-analytic spectral factor of Q{z). We 
have: 

N ^ N* = {B*PB + iy/^ 
M = {B*PB + iy^/'^B*PA 

[zI-Ay^B ' 

I \ (41) 

= {B*PB + iy^/'^B*PA{zI - Ay^B 

+ {B*PB + iy/^ 

Qa(z) = Aa*(z)Aa(z) 
and finally Q~^^{z) — A\~^{z)Aa^* (z) where, by means of lemma 6.3 



Aa(z) 



M N 



Aa~\z) = -{B*PB + iy^B*PA{zI - ry'^Bx 

X {B*PB + ly^i"^ + {B*PB + ly^i"^ 

T = A- B{B*PB + iy^B*PA 



(42) 
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C. Computation of the integrals in \26\ and \27\ 



By virtue of Lemma 6.5 and Remark 6.6 we can switch from a right factorization of a spectrum ($ = H*H) to a left 
factorization ($ = M^Ty*), and vice versa. We will now show that both ( [26| and ( pT] ) can be reduced to integrals of the form 

(43) 



G{z)^!C\z)^{z)^K-*{z)G*{z) 
where <i>(z) is a spectrum. Indeed, let <i>*(z) — Aa *(^)^'(2:)Aa~^(^;). Then 

j G{z)Ql\z)^{z)Ql\z)G*{z) 

= j G(z)Aa-^(z)(Aa"*(z)vI/(z)Aa-1(z))Aa-*(z)G*(z) 



(44) 



which has the form (43 i with = <1>^. Applying Lemma 6.5 we obtain a (left) spectral factor of (l>>i,(z): 

= Aa-*{z){W^,{z)WI{z))A^-\z) 
= ^A-*{z){Hl{z)H^{z))^j,-\z) 
= {H^{z)A^-\z)r{H^{z)Aj,-\z)) 
= Wi{z)W^{z) 



(45) 



Finally, (44i can be computed obtaining a realization of G(z)Aa ^{z)Wi{z) and applying Lemma 6.1 Now, let $s(z) 
Aa^* {z)G* {z)T.G{z)A\^^{z), where S is one of the precomputed generators of Range F. Then 

J GQl^ [{G*Y.GQlH) + (G*SGQX'*)1 Ql^G* 

GAa^^Aa^* [(G*EGAa"^Aa"*«') 

+ (^'Aa"'Aa"*G*EG)] Aa~'Aa"*G* 
GAa"' [(Aa-*G*SGAa-'Aa-**Aa-') 

+ (Aa"**Aa"'Aa"*G*SGAa"')] Aa"*G* 
GAa"^ [*s** + *>]/*s] Aa"*G* 

GAa"^ [(^-s + «'*)(«'s + 

- - ^'s^'e] Aa"*G* 

GAa"' + + Aa"*G* 

- y GAa^' Aa"*G* - y GAa"' [$s$s] Aa~*G* 

which is a difference of integrals of the form (43 1. To compute (46 1, we must obtain (left) spectral factors of $$<l>,p*, 
and ($5] + $*)(<I's + ^ip)*- Suppose, first, that S > 0. For the first spectrum we have 



(46) 



For the second, we have 



*s = (Aa-*G*e1/2)(e1/2gAa-1) = H-^H^ = W^W^ 



(47) 



(48) 
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And for the third: 

+ **)(*s + 

= (Zs + + + Z*)(Zs + Z^ + Zi + ZD* 
= ((Zs + Zi) + (Zs + Zi)*)((Zs + Zi) + (Zs + Zi)*)* 
= (Zis + ZJ'j^)(Zis + Zl^)* 
= W^is(VK*sW^is)M^rs 

= (T^isi/is)(Vt^isifiE)* 

where Zi is the causal part of (f>>j, Zis = Zi + Zs, VFis is a left factor of the spectrum Zis + ZJ'j^, Hi^. is a left factor of 



(49) 



the spectrum W^i*5]Wis, and where we used Lemma 6.1 to obtain the causal part of $s and from their spectral factors, 
and Lemma |6.2| to obtain the minimum phase spectral factor of the sum (l>s + from its causal part. Thus, if S > 0, we 
really have all the tools to compute integral ( [46] l. 

Now, E is not necessarily positive definite, but if —A < is the minimum between the eigenvalues of all the generators E^, 



then S + (A + 1)/ is positive definite. Thus, in the general case, by linearity (46 1 can be reduced to: 

= J GAa~^ [<i>s+(A+l)/-(A+l)/ *^>* 

+ <1>^ $S+(A+1)/-(A+1)/] Aa~*G* 

= J GAa"^ [$s+(a+i)/ + *s+(A+i)/] Aa"*G* 
- (A + 1) J GAa"^ [*/ + $/] Aa"*G* 



(50) 



which is a difference of integrals with the same structure of (46 1, and that are computable with the above tools (obviously 
/ GAa^ + $/] Aa *G* needs to be computed only once). This enables us to solve equation (24i. 



D. Computations in the backtracking step 

The backtracking stage involves similar, though easier, computations. We must check the following conditions: 

A, + i*^AA, e 

J3,(A,: + i.^AA,) < J* (A,;) + at'^iV Jq, AA^i 



(51) 
(52) 



Checking (51 1 is really a matter of checking whether we can factorize / + G*(Ai + t^AKi)G. Thus must be halved until 
the ARE (1^1 is solvable having A = A^ + t^AKi. 

Finally, to check (52i, we need to compute J^. This can be done in a way similar to the above computations: 

J^{K) = tv j {I + G*KG)-^-^ + tvK 





i Aa-'Aa~*W^*M^;4 


- tr A 




1 Aa-*(W^*M^;)Aa-' 


+ trA 


="J 


I Aa-* {H*^H^) A A-' 


+ tr A 


^"j 


f (H^AA-^riH^AA- 


i) + trA 




[ WW* +tr A 





(53) 



VII. Proof of global convergence 

Given that the minimum of exists and is unique, we investigate global convergence of our Newton algorithm. First, we 
recall the following 

Definition: a function f{x) twice differentiable in a set S is said to be strongly convex in S if there exists a constant m > 
such that H{x) > ml for x £ S, where Ii{x) is the Hessian of / at x. 
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We restrict our analysis to a sublevel set of J^. Let Aq = 0. The set 

S -.^ {AeC^ : J*(A) < Jv[,(Ao) = tr / (54) 

is compact (as it was shown in flTl Section VII]). Because of the backtracking in the algorithm, the sequence J*(Ao), J^((Ai), ... 
is decreasing. Thus A„ G S*, Vn > 0. We now wish to apply a theorem in (TU., 9.5.3, p. 488] on convergence of the Newton 
algorithm with backtraking for strongly convex functions on M". This theorem ensures linear decrease for a finite number of 
steps, and quadratic convergence to the minimum after the linear stage, thus establishing global convergence of the Newton 
algorithm with backtracking. We proceed to establish first strong convexity of Jq, on S. To do that, we employ the following 
result. 

Lemma 7.1: Let /(x) be defined over an open convex subset Z? of a finite-dimensional linear space V . Assume that / is 
twice continuously differentiable and strictly convex on D. Then / is strongly convex on any compact set S C D. 
Proof: 

First, recall that since / is twice continuously differentiable and strictly convex, its Hessian is an Hermitian positive- 
definite matrix at each point x. By Lemma [5T| the mapping from H to its minimum (real) eigenvalue is continuous. It follows 
that the mapping from x to the minimum eigenvalue of the Hessian of / at x is also continuous, being a composition of 
continuous functions. Hence the latter admits a minimum m in the compact set S by Weierstrass' theorem. Thus m is the 
minimum of the eigenvalues of all the Hessians computed in S, and m cannot be zero, since otherwise there would be an x 
with singular, and this cannot happen since / is strictly convex. Hence > ml Vx G S, i.e. / is strongly convex on S. 
□ 

Remark 7.2: By an argument similar to that of Lemma |5.1[ it can be shown that for a twice continuously differentiable 
function which is strictly convex on D, there exists AI > such that < MI for all x £ S*. Moreover, strong convexity on 
a closed set S implies boundedness of the latter Thus, strong convexity and boundedness of the Hessian are intertwined, and 
both are essential in the proof of Theorem 7.3 (see ifTOl ). 



Theorem 7.3: The following facts hold true: 

1) Jif is twice continuously differentiable on S; 

2) is strongly convex on S; 

3) the Hessian of is Lipschitz-continuous over S; 

4) the sequence {Ki]i > 0} generated by the Newton algorithm of Section V (23i-(52i converges to the unique minimum 
point of J* in £^ . 

Proof: 



Property 1 is a trivial consequence of Theorem 5.3 To prove 2, remember that is strictly convex on , hence also on 



S, and apply Lemma [tTT] As for property 3, what it really says is that the following operator: 



is Lipschitz continuous on S. Theorem 5.3 implies that e C'^{Cp) or, which is the same, that H e C^{C^). The continuous 
differentiability of H implies its Lipschitz continuity over an arbitrary compact subset of , hence also over the sublevel 
set S, and property 3 follows. Finally, to prove 4, notice that all the hypotheses of |10, 9.5.3, p. 488] are satisfied. Namely, 
the function to be minimized is strongly convex on the compact set S, and its Hessian is Lipschitz-continuous over S. It 
remains to observe that is defined over a subset of the linear space Range T which has finite dimension d over M (recall 



that Ranger is spanned by a finite set of matrices. See Proposition 3.1 and Remark 3.3 where d < N). Thus, once we choose 
a base in Range F, to every A e £^ there corresponds a vector in 18^, to every positive definite bilinear form over Range F 
there corresponds a positive definite matrix in M''^'^, and to every compact set in £^ there corresponds a compact set in 
M''. Hence, every convergence result that holds in M."^ must also hold in the abstract setting, in view of the homeomorphism 
between one space and the other. □ 



VIII. Application to spectrum estimation 

A. A spectral estimation procedure 

Following the purposes of the THREE method presented in ||9l, now we describe an application of the above approximation 
algorithm to the estimation of spectral densities. Consider first the scalar case, and suppose that the finite sequence yi, i/jv 
is extracted from a realization of a zero-mean, weakly stationary discrete-time process {yt}t^oo- want to estimate the 
spectral density ^y{e^^) of y. The idea is the following: 

• Fix a transfer function G{z) — {zl — A)^^B, feed the data {yi} to it, and collect the output data {xi}. 

• Compute a consistent, and possibly unbiased, estimate S of the covariance matrix of the outputs {xi}. Note that some 
output samples xi, ...^xm should be discarded so that the filter can be considered to operate in steady state. 

• Choose as "prior" spectrum $y a coarse, low-order, estimate of the true spectrum of y obtained by means of another 
(simple) identification method. 
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• "Refine" the estimate by solving the approximation problem ^ with respect to G{z), S = S, and = ^y. 
To be clear, the result of the above procedure is the only spectrum, compatible with the output variance S, which is closest 
to the rough estimate <ty in the dn distance. Note that we are left with significant degrees of freedom in applying the above 
procedure: The method for estimating <ty, in particular its degree, and the whole structure of G{z) = {zl — A)^^B, which 
has no contraints other than A being a stability matrix and {A, B) being reachable. 

The coarsest possible estimate of ^y is the constant spectrum equal to the sample variance of the {yi], i.e. ^y{e^'^) = 
cFy, where = j^^i X^ili Ij/iP- The resulting spectrum has the form (7^(1 + G*AG)^^. Another simple choice is <lj, — 



W{z)W*{z), where W{z) 



£(£i 

'a(z) 



is a low-order AR, MA or ARMA model estimated from yi, y^v by means of predictive 
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error minimization methods or the like. 

The flexibility in the choice of G{z) is more essential, and has more profound implications. As described in ||9l, ESl . ifTSll 
and ifTTl . the following choice: 



(55) 



where the pi\ lie inside the unit circle, implies that the (true) steady-state variance E has the structure of a Pick matrix, and the 
corresponding problem of finding any spectrum that satisfies ([T]) is a Nevanlinna-Pick interpolation. Moreover, the following 
choice: 



(56) 



implies that the steady-state variance E is a Toeplitz matrix whose diagonals contain the lags co,ci, ...,c„_i of the covariance 
signal of the input, and the corresponding problem of finding any spectrum that satisfies ([T]) is a covariance extension problem. 

These facts justify the theoretical interest in algorithms for constrained spectrum approximation, if for no other reason, as 
tools to compute at least one solution to a Nevanlinna-Pick interpolation or to a covariance extension problem, respectively. 
But the freedom in choosing G{z) has implications also in the above practical application to spectral estimation, where the 
key properties, not surprisingly, depend on the poles of G{z), i.e., the eigenvalues of A. In general, as described in f9l, 
the magnitude of the latter has implications on the variance of the sample covariance S: The closer the eigenvalues to the 
origin, the smaller that variance (see |[9j Section II.D]). Moreover, at least as far as THREE ||9l is concerned, the phase of the 
eigenvalues influences resolution capability: More precisely, the spectrum estimation procedure has higher resolution in those 
sectors of the unit circle where more eigenvalues are located. According to simulations, the latter statement appears to be true 
also in our setting (the fundamental difference being that the metric which is minimized is the Bellinger distance instead of 
the Kullback-Leibler one). 

Remark 8.1: In the above setting E is a consistent estimate of the true steady-state variance. Although S must belong to 
Ranger as +oo (this being the case even if y is the sum of a purely nondeterministic process and some sinusoids, as 

in the simulations that follow), it is almost certainly not the case that S £ Range F when we have available only the finitely 
many data xm+1t--,xn. Strictly speaking, this implies that the contraint ([l} with S = S is almost always not feasible. It 
turns out that, increasing the tolerance threshold in its step 5, the Newton algorithm exhibits some kind of robustness in this 
respect. That is, it leads to a A whose corresponding spectrum $ is close to satisfying the constraint. 

Nevertheless, we prefer a clear understanding of what the resulting spectrum really is. Thus, we choose to enforce feasibility of 
the approximation problem, at least as permitted by machine number representation, before starting the optimization procedure. 
To this end, following the same approach employed in [9|, we pose the approximation problem not in terms of the estimated 
S, but in terms of its orthogonal projection Er onto Range F, which can be easily computed by means of algebraic methods. 
That is to say: We cannot approximate in the preimage F^^(S), because that set is empty, thus we choose to approximate in 
F^^(Sr), where Er is the matrix closest to E such that its preimage is not empty. This seems a reasonable choice and by the 
way it is, mutatis mutandis, what the Moore-Penrose pseudoinverse does for the "solution" x = A^b, when the linear system 
Ax — b is not solvable. 

Note that it is not guaranteed at all that the projection of a positive definite matrix onto a subspace of the Hermitian matrices 
is itself positive definite. In practice, this is not really a problem, inasmuch S is "sufficiently positive" and close to Range F. 
The positivity of Sp must anyway be checked before proceeding. This approach and the considerations on the positivity issue 
should be compared to |£i Section II.D], which deals with the particular case when Range F is the space of Toeplitz matrices, 
and to 1.27. Section 4], where, to find a matrix a Er close to E, a Kullback-Leibler criterion is adopted instead of least squares. 



DRAFT 



14 



B. Simulation results: Scalar case 




Fig. 1. Estimation of an ARMA(6,4) spectrum by means of Hellinger-distance spectrum approximation, constant prior and AR(3) prior 

Figure [T] shows the results of the above estimation procedure with G{z) structured according to the covariance extension 
setting (j56]l with 6 covariance lags (i.e. n 6, A is 6 x 6), run over 500 samples of the following ARMA process: 

y{t) = Q.by{t - 1) - QA2y{t - 2) + 0.602y(i - 3) 
- 0.0425y(i - 4) + 0.1192i;(i - 5) 
+ e{t) + l.le(t - 1) + 0.08e(i - 2) - 0.15e(t - 3) 

(poles in 0.9, — 0.2±0.7j, ±0.5j) where e{t) is a zero-mean Gaussian white noise with unit variance. Two priors, both estimated 
from data, have been considered: the constant spectrum ^y{e^'^) = and the spectrum — W ar{z)W'^j^{z) , where 
War{z) — is an AR model of order 3 obtained from the data by means of the Predictive Error Method procedure in 
Matlab's System Identification toolbox. 

Figure |2] shows the performance of the above procedure in a setting that resembles that of f9' Section IV.B, Example 1]. 
The estimation procedure was run on 300 samples of a superposition of two sinusoids in colored noise: 

y{t) = 0.5sin(wit + + 0.5sin(w2i + 02) + z{t) 
z{t) = 0.8z{t - 1) + 0.5j/(i) + 0.25i/(t - 1) 

with 01, 02 and i'{t) independent normal random variables with zero mean and unit variance, oji — 0.42 and aj2 = 0.53. The 
prior here considered is the constant spectrum equal to the sample variance of the {yi} data. Following |9|, A was chosen 
real block-diagonal with the following poles (equispaced in a narrow range where the frequencies of the two sinusoids lie, to 
increase resolution in that region): 

0,0.85,-0.85, 

0.9e±j"-42, 0.9e±j°-44, 0.9e±j"-46, 0.9e±j°-48, 0.9e±j°-^" 

(and B a column of ones). It can be seen that Hellinger-distance based approximation does a good job, as does the THREE 
algorithm, at detecting the spectral lines at frequencies uji and uj2- 
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Fig. 2. Spectral estimates of two sinusoids with superimposed noise by means of Hellinger-distance spectrum approximation, constant prior. Compare with 
(9] Section IV.B, Example 1]. 



C. Simulation results: Multivariate case 

We now consider spectral estimation for a multivariate process. Here, 100 samples of a bivariate process with a high order 
spectrum were generated by feeding a bivariate Gaussian white noise with mean and variance / to a square (stable) shaping 
filter of order 40. The latter was constructed with random coefficients, except for one fixed conjugate pair of poles with 
radius 0.9 and argument 0.52, and one fixed conjugate pair of zeros with radius 1 — lO^'"^ and argument 0.2. The transfer 
function G{z) was chosen with one pole in the origin and 4 complex pole pairs with radius 0.9 and frequencies equispaced 
in the range [0,7r]. Then the above estimating procedure was applied, with prior spectrum chosen as the constant density 
equal to the sample covariance of the bivariate process y. Figure p] shows a plot of <i>ii(ej''). Re <i>i2(ej''), Im<i>i2(ej'') and 
*I'22(e^''), respectively for the true spectrum and for the estimation of the latter based on one run of 100 samples. In Figure 
|4] we compare the performances of various spectral estimation methods in the following way. We consider four estimates 
^H, 4>ME, *&PKM._ and <1>N4SID of The spectral density <SE>h is the estimate obtained by the procedure described above in 
Subsection 



vm-A 



The spectral density ^me is the maximum entropy estimate |l22l| obtained using the same G{z) employed 
to obtain our estimate. The spectral densities ^pem and $n4Sid are the estimates of $ obtained by using "off-the-shelf" 
Madab procedures for the Prediction Error Method (see i.e. 1^ or 133]) and for the N4SID method (see ES) or 1331): The 
former is a multivariable extension of the classical approach to ARMAX identification, while the latter is a standard algorithm 
in the modern field of subspace identification. In order to obtain a comparison reasonably independent of the specific data set, 
we have performed 50 independent runs each with 100 samples of y. In such a way we have obtained 50 different estimates 
4m j, M = H, ME, PEM, N4SID, i = 1, 2, . . . , 50, for each method. 



We have then defined 



1 



50 



i—1 



(57) 



where 



denotes the spectral norm. This is understood as the average estimation error of our method at each frequency. 



DRAFT 



16 






im(Phi1 2), truG spectrum 
im(Phi1 2), estimate w/Hellinger 


1 1 1 1 1 1 




Fig. 3. Estimation of the spectrum of a bivariate process with rich dynamics by means of Hellinger-distance spectrum approximation, constant prior. 




Fig. 4. Estimation of the spectrum of a bivariate process with lich dynamics by means of various methods. Comparison between the spectral norm of the 
differences $h ~ "J?, 3'me ~ 'S'. "^PEM ^ "J", ^nd 'I'n4SID ~ ^ (average over 50 simulations). 



Similarly, we have defined the average errors Eme{'&), £^pem('!?), and £'n4Sid(i^) of the other methods. In the each of the plots 
of Figure [4j we depict the average error of our method Eji{d) together with the average error of one of the other methods. 
More explicitly, the first diagram shows the error for the Bellinger approximation method and for the maximum entropy 
spectrum described in |22|. The second diagram shows the error for the Hellinger approximation and for the spectrum obtained 
via MATLAB's PEM identification method. The third diagram shows the same for Hellinger approximation and MATLAB's 
N4SID method. The Hellinger approximation based approach appears to perform better or much better than the other methods. 
The simulation yields similar results with N = 200 data points. With N = 300 data samples, PEM and N4SID perform as 
well as our method. 

Of course, one should always take into account the complexity of the resulting spectrum. In this example, G{z) being of order 
9, the resulting spectral factor (or "model") produced by the Hellinger approximation has order 18, whereas the corresponding 
maximum entropy model has order 9 and both N4SID and PEM usually choose order 10. 

In our simulation, the norm of the difference of two estimates produced by PEM or by N4SID is sometimes very large when 
compared to the norm of the difference between any two of the estimates produced by our method. That is, although PEM 
and N4SID are provably consistent as — > oo, when few data are available both of them may introduce occasional artifacts, 
which are well visible as "peaks" in figure |4] (a "peak" in the 50-run average is due to a very high error in one of the runs, 
not to a systematic error). Our method appears to be more robust in this respect. 
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IX. Conclusion 

In this paper, we considered the the new approach to multivariate spectrum approximation problem with respect to the 
multivariable Hellinger distance, which was proposed in |17|. We developed in detail the matricial Newton algorithm which 
was sketched there, and proved its global convergence. Finally, we described an application of this approach to spectral 
estimation, and tested it against the well-known PEM and N4SID algorithms. 

It appears that approximation in the Hellinger distance may be a useful tool to gain insight into the dynamics of a multivariate 
process when fewer data are available. In particular, simulations suggest that this method is less prone to produce artifacts 
than PEM and N4SID. Another advantage of our method and of the maximum entropy paradigm is that a higher resolution 
estimate in a prescribed frequency band can be easily achieved by properly placing some poles of G{z) close to the unit circle 
and with phase in the prescribed band. 

Numerical robustness of the algorithm with respect to the number and the position of the poles is an open challenge. Also, 
the analysis of the achievable precision of the results (in a statistical sense) has still to be developed. 
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